Create a map of Edinburgh, split into its natural neighbourhoods. Link for later use, when colour-coding the map by similarity: https://python-visualization.github.io/folium/quickstart.html

The next cell is based on this example: https://geopandas.readthedocs.io/en/latest/gallery/polygon_plotting_with_folium.html

The following must be entered to use the Foursquare API. The results of this call are saved to a file so that the API does not have to be called again.
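A minimal sketch of the save-then-reuse pattern described above. The names `load_or_fetch` and `fake_fetch`, the file name and the JSON format are all assumptions for illustration; `fake_fetch` is a made-up stand-in for the real API request and returns invented rows.

```python
import json
import tempfile
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return cached results if the file exists; otherwise call fetch() and cache the result."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    results = fetch()
    path.write_text(json.dumps(results), encoding="utf-8")
    return results


calls = []  # records how often the (fake) API is hit


def fake_fetch():
    # Hypothetical stand-in for the real API request; the data is made up.
    calls.append(1)
    return [{"venue": "Example Cafe", "neighbourhood": "Leith"}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "venues.json"
    first = load_or_fetch(cache_file, fake_fetch)   # no file yet: calls fake_fetch and writes the file
    second = load_or_fetch(cache_file, fake_fetch)  # file exists: read from disk, no call made
```

After both calls, `calls` holds a single entry, so the second load never touched the API.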

The DataFrame is converted to a GeoDataFrame and then cleaned by removing any venues that are not actually in the neighbourhood they were assigned to. Up to this point, for the purposes of the API call, each neighbourhood has been approximated by a large circle around its centre.

Define a function to find which neighbourhood the venue is really in.

Run the function on every row of the DataFrame and drop any rows whose neighbourhood label does not match the natural neighbourhood boundary the venue actually falls within.
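A rough sketch of the lookup-and-filter step, not the notebook's own code. It uses a plain ray-casting point-in-polygon test so it runs without geospatial libraries; with a GeoDataFrame, the geometry `contains` check does the same job. The names `point_in_polygon`, `find_neighbourhood`, `boundaries`, the column names and the square "neighbourhoods" are all made up for illustration.

```python
import pandas as pd


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_neighbourhood(x, y, boundaries):
    """Return the name of the first boundary containing the point, or None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Made-up boundaries: two unit squares side by side.
boundaries = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
venues = pd.DataFrame({
    "venue": ["v1", "v2", "v3"],
    "neighbourhood": ["A", "A", "B"],  # label from the circular API search
    "lon": [0.5, 1.5, 1.2],
    "lat": [0.5, 0.5, 0.8],
})

# Look up the true neighbourhood for each row, then keep only rows whose label agrees.
venues["actual"] = venues.apply(
    lambda row: find_neighbourhood(row["lon"], row["lat"], boundaries), axis=1
)
venues = venues[venues["actual"] == venues["neighbourhood"]].drop(columns="actual")
```

Here `v2` is labelled "A" but lies inside "B", so it is dropped; `v1` and `v3` remain.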

Map all venues that are still being used.

Next, add a column with additional data about each venue (price tier and likes).

The number of rows without price tier and likes is relatively small. Check whether they are evenly distributed across the neighbourhoods.

The rows with missing data are fairly evenly distributed between the neighbourhoods, so they are dropped from the DataFrame.
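One way to run this kind of check, sketched on made-up data: compute the share of rows with missing details per neighbourhood, then drop those rows. The column names `neighbourhood` and `details` are assumptions, not the notebook's actual names.

```python
import pandas as pd

# Made-up data: "details" holds (price tier, likes), or None when the lookup returned nothing.
venues = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "details": [(1, 10), None, (2, 5), (1, 3), (3, 8), (2, 2), None, (1, 1)],
})

missing = venues["details"].isna()

# Fraction of rows with missing details in each neighbourhood.
missing_share = missing.groupby(venues["neighbourhood"]).mean()

# Drop the rows with missing details.
cleaned = venues[~missing].reset_index(drop=True)
```

If the shares in `missing_share` are similar across neighbourhoods, dropping the rows should not bias any one neighbourhood much.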

Check that the drop above did what was expected:

The following splits the price tier and likes column into separate columns.
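A small sketch of such a split, assuming the combined column holds a `(price tier, likes)` pair per row. That storage format, and the column names `price_and_likes`, `price_tier` and `likes`, are assumptions for illustration.

```python
import pandas as pd

# Made-up data with one combined column of (price tier, likes) pairs.
venues = pd.DataFrame({
    "venue": ["v1", "v2"],
    "price_and_likes": [(1, 10), (3, 250)],
})

# Expand the pairs into two named columns, keeping the original row index.
split = pd.DataFrame(
    venues["price_and_likes"].tolist(),
    columns=["price_tier", "likes"],
    index=venues.index,
)

# Replace the combined column with the two new ones.
venues = venues.drop(columns="price_and_likes").join(split)
```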

Save the data to avoid unnecessary calls to the API.

Start here to avoid calls to the API.

Remove neighbourhoods with fewer than 5 venues from the analysis.
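A minimal sketch of this filter on made-up data. The column names and the variable `min_venues` are assumptions.

```python
import pandas as pd

# Made-up data: A has 6 venues, B has 3, C has exactly 5.
venues = pd.DataFrame({
    "neighbourhood": ["A"] * 6 + ["B"] * 3 + ["C"] * 5,
    "category": ["Cafe"] * 14,
})

min_venues = 5

# Number of venues in each row's neighbourhood.
venue_counts = venues["neighbourhood"].map(venues["neighbourhood"].value_counts())

# Keep neighbourhoods with at least min_venues venues.
kept = venues[venue_counts >= min_venues]
```

Neighbourhood B (3 venues) is removed, while C, with exactly 5, is kept.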

Sum over each type of venue and only keep venue types with more than 5 occurrences.

Take the mean of each column for each neighbourhood.
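The two steps above could look roughly like this, assuming the venue types are one-hot encoded first. That encoding step and the column names are assumptions, and the data is made up. Note that in this sketch the mean is taken over all venues in a neighbourhood, so a row need not sum to 1 once rare types are removed.

```python
import pandas as pd

# Made-up data: 5 venues in each of three neighbourhoods.
venues = pd.DataFrame({
    "neighbourhood": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "category": [
        "Cafe", "Cafe", "Cafe", "Pub", "Park",
        "Cafe", "Pub", "Pub", "Pub", "Pub",
        "Cafe", "Cafe", "Pub", "Pub", "Museum",
    ],
})

# One column per venue type, 1.0 where the venue is of that type.
one_hot = pd.get_dummies(venues["category"], dtype=float)

# Total occurrences of each type; keep types seen more than 5 times.
totals = one_hot.sum()
common = totals[totals > 5].index

# Proportion of each remaining type within each neighbourhood.
profile = one_hot[common].groupby(venues["neighbourhood"]).mean()
```

Here "Park" and "Museum" each occur once and are dropped, leaving "Cafe" (6) and "Pub" (7).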

Add columns giving the top 3 venue types for each neighbourhood.
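A sketch of one way to build these columns from a table of per-neighbourhood proportions. The helper `top_venues`, the column names `top_1` to `top_3` and the numbers are made up for illustration.

```python
import pandas as pd

# Made-up proportions of each venue type per neighbourhood.
profile = pd.DataFrame(
    {
        "Cafe": [0.50, 0.10],
        "Pub": [0.25, 0.60],
        "Park": [0.15, 0.05],
        "Museum": [0.10, 0.25],
    },
    index=pd.Index(["A", "B"], name="neighbourhood"),
)


def top_venues(row, n=3):
    """Return the n column names with the largest values in the row."""
    return row.sort_values(ascending=False).index[:n].tolist()


# One row of top-3 labels per neighbourhood.
top = pd.DataFrame(
    [top_venues(row) for _, row in profile.iterrows()],
    index=profile.index,
    columns=["top_1", "top_2", "top_3"],
)

result = profile.join(top)
```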

Perform clustering based on the proportion of venues of each type in each neighbourhood.
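The clustering method is not named here. k-means is one common choice and is assumed in the sketch below, written in plain NumPy to keep it self-contained. The function `kmeans`, its parameters and the toy proportions are illustrative only.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means: returns (labels, centroids) for the rows of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (unchanged if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up proportions: rows are neighbourhoods, columns are (Cafe, Pub).
X = np.array([
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
])

labels, centroids = kmeans(X, k=2)
```

On this toy input the two cafe-heavy rows end up in one cluster and the two pub-heavy rows in the other. The cluster numbers themselves are arbitrary.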